Loading…
中国上海
2019 年 6 月 24–26 日
单击此处了解更多信息和注册

点击此处查看英文版日程表。
To view the English version of this schedule please go here.

我们将为所有主题演讲和分组会议提供同声传译服务。
Simultaneous translation will be provided for all keynote and breakout sessions.

场馆 + 赞助商展示区地图
Venue + Sponsor Showcase Map
Tuesday, June 25 • 16:00 - 16:35
Kubernetes 集群的大规模分布式深度学习 - Yuan Tang,蚂蚁金服;Yong Tang,MobileIron

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Feedback form is now closed.
本次演讲的重点是在 Kubernetes 上部署大规模分布式深度学习。此外,还将介绍如何通过使用运算符来管理和并实现机器学习训练过程自动化。我们将分享我们的经验,并比较两个开源 Kubernetes 运算符:tf-operator 和 mpi-operator。这两个运算符都为 TensorFlow 管理训练任务,但有着不同的分配策略,这就造成了 CPU、GPU 和网络利用率方面的不同性能结果。

深度学习任务既是网络密集型又是 GPU 密集型,因此对编排进行适当优化非常重要。易发的不平衡会导致闲置计算容量,这对于 GPU 节点来说成本太高昂了(与 CPU 相比)。我们将分享我们的经验,希望可提供有用的洞察,帮助从机器学习任务中获得更好的经济效益。

Speakers
avatar for Yuan Tang

Yuan Tang

Principal Software Engineer, Red Hat
Yuan is a principal software engineer at Red Hat, working on OpenShift AI. Previously, he has led teams to build AI infrastructure and platforms at various companies, including Alibaba and Akuity. He's a project lead of Argo and Kubeflow, a maintainer of TensorFlow and XGBoost, and... Read More →
avatar for Yong Tang

Yong Tang

Senior Director of Engineering, Ivanti
Yong Tang is Senior Director of Engineering at Ivanti. He is a core maintainer of CoreDNS and contributes to many container, cloud-native, and machine learning projects for the open source community. In addition to CoreDNS, he is a maintainer of Docker/Moby. He is also a maintainer... Read More →


Tuesday June 25, 2019 16:00 - 16:35 CST
620